(Generalized) Post Correspondence Problem and semi-Thue systems



François Nicolas*
November 12, 2008 



Abstract 

Let PCP($k$) denote the following restriction of the well-known Post Correspondence
Problem [10]: given an alphabet $\Sigma$ of cardinality $k$ and two morphisms $\sigma$,
$\tau : \Sigma^* \to \{0,1\}^*$, decide whether there exists $w \in \Sigma^+$ such that
$\sigma(w) = \tau(w)$. Let Accessibility($k$) denote the following restriction of the
accessibility problem for semi-Thue systems: given a $k$-rule semi-Thue system $T$ and two
words $u$ and $v$, decide whether $v$ is derivable from $u$ modulo $T$. In 1980, Claus showed
that if Accessibility($k$) is undecidable then PCP($k+4$) is also undecidable [2]. The aim
of the paper is to present a clean, detailed proof of the statement.

We proceed in two steps, using the Generalized Post Correspondence Problem [3]
as an auxiliary. Let GPCP($k$) denote the following restriction of GPCP: given an
alphabet $\Sigma$ of cardinality $k$, two morphisms $\sigma$, $\tau : \Sigma^* \to \{0,1\}^*$
and four words $s$, $t$, $s'$, $t' \in \{0,1\}^*$, decide whether there exists
$w \in \Sigma^*$ such that $s\sigma(w)t = s'\tau(w)t'$.
First, we prove that if Accessibility($k$) is undecidable then GPCP($k+2$) is also
undecidable. Then, we prove that if GPCP($k$) is undecidable then PCP($k+2$) is
also undecidable. (The latter result can also be found in [7].)

To date, the sharpest undecidability bounds for both PCP and GPCP have been
deduced from Claus's result: since Matiyasevich and Sénizergues showed that
Accessibility(3) is undecidable [9], GPCP(5) and PCP(7) are undecidable.



1 Introduction 

A word is a finite sequence of letters. The empty word is denoted by $\varepsilon$. For every
word $w$, the length of $w$ is denoted by $|w|$. A set of words is called a language. Word
concatenation is denoted multiplicatively. For every language $L$, $L^+$ denotes the closure
of $L$ under concatenation, and $L^*$ denotes the language $L^+ \cup \{\varepsilon\}$. An
alphabet is a finite set of letters. For every alphabet $\Sigma$, $\Sigma^+$ equals the set of
all non-empty words over $\Sigma$, and $\Sigma^*$ equals the set of all words over $\Sigma$
including the empty word.

Let $x$ and $y$ be two words. We say that $x$ is a prefix (resp. suffix) of $y$ if there exists
a word $z$ such that $xz = y$ (resp. $zx = y$). A prefix (resp. suffix) of $y$ is called proper
if it is distinct from $y$. We say that $x$ occurs in $y$ if there exists a word $z$ such that
$zx$ is a prefix of $y$. The number of occurrences of $x$ in $y$ is denoted by $|y|_x$: $|y|_x$
equals the number of words $z$ such that $zx$ is a prefix of $y$.
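The word-level notions above translate directly into code. The following is a minimal sketch, assuming words are represented as Python strings; the function names are illustrative and not part of the paper.

```python
def is_prefix(x: str, y: str) -> bool:
    """Return True if x is a prefix of y, i.e. x + z == y for some word z."""
    return y.startswith(x)


def is_suffix(x: str, y: str) -> bool:
    """Return True if x is a suffix of y, i.e. z + x == y for some word z."""
    return y.endswith(x)


def count_occurrences(x: str, y: str) -> int:
    """Return the number of words z such that z + x is a prefix of y.

    Occurrences may overlap, so str.count (which skips overlaps) is not used.
    """
    return sum(1 for i in range(len(y) - len(x) + 1) if y.startswith(x, i))
```

With this definition the empty word occurs once at every position, so `count_occurrences("", "abc")` is 4, and overlapping occurrences are counted separately, so `count_occurrences("aa", "aaaa")` is 3.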



1.1 The (Generalized) Post Correspondence Problem 

Let $\Sigma$ and $\Delta$ be alphabets. A function $\sigma : \Sigma^* \to \Delta^*$ is called a
morphism if $\sigma(xy) = \sigma(x)\sigma(y)$ for every $x$, $y \in \Sigma^*$. Note that any
morphism maps the empty word to itself. Moreover, for every function
$\sigma_1 : \Sigma \to \Delta^*$, there exists exactly one morphism
$\sigma : \Sigma^* \to \Delta^*$ such that $\sigma(a) = \sigma_1(a)$ for every $a \in \Sigma$.
Hence, although the set of all functions from $\Sigma^*$ to $\Delta^*$ has the power of the
continuum, the restriction of $\sigma$ to $\Sigma$ provides a finite encoding of $\sigma$ for
every morphism $\sigma : \Sigma^* \to \Delta^*$. From now on such encodings are considered as
canonical.

The well-known Post Correspondence Problem (PCP) [10] can be stated as follows:
given an alphabet $\Sigma$ and two morphisms $\sigma$, $\tau : \Sigma^* \to \{0,1\}^*$, decide
whether there exists $w \in \Sigma^+$ such that $\sigma(w) = \tau(w)$. For each integer
$k \ge 1$, PCP($k$) denotes the restriction of PCP to instances $(\Sigma, \sigma, \tau)$ such
that $\Sigma$ has cardinality $k$.

The Generalized Post Correspondence Problem (GPCP) [4] is: given an alphabet $\Sigma$,
two morphisms $\sigma$, $\tau : \Sigma^* \to \{0,1\}^*$ and four words $s$, $t$, $s'$,
$t' \in \{0,1\}^*$, decide whether there exists $w \in \Sigma^*$ such that
$s\sigma(w)t = s'\tau(w)t'$. Note that if $st = s't'$ then $\varepsilon$ is a feasible
solution of GPCP on $(\Sigma, \sigma, \tau, s, t, s', t')$, while all feasible solutions of PCP
are non-empty words.

Remark 1. For every instance $(\Sigma, \sigma, \tau)$ of PCP, $(\Sigma, \sigma, \tau)$ is a
yes-instance of PCP if and only if there exists $a \in \Sigma$ such that
$(\Sigma, \sigma, \tau, \sigma(a), \varepsilon, \tau(a), \varepsilon)$ is a yes-instance of GPCP.

For each integer $k \ge 1$, GPCP($k$) denotes the restriction of GPCP to instances
$(\Sigma, \sigma, \tau, s, t, s', t')$ such that $\Sigma$ has cardinality $k$.
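To make the two problem statements concrete, here is a minimal sketch that checks candidate solutions and runs a bounded brute-force search. It assumes a morphism is encoded by a dictionary from single-character letters to binary strings (the canonical finite encoding mentioned above); all function names and the sample instance are illustrative. Since PCP is undecidable in general, the search can only confirm yes-instances within the given length bound.

```python
from itertools import product


def apply_morphism(images: dict, word) -> str:
    """Image of a word (any iterable of letters) under the encoded morphism."""
    return "".join(images[a] for a in word)


def is_pcp_solution(sigma: dict, tau: dict, w) -> bool:
    """PCP requires a non-empty word w with sigma(w) == tau(w)."""
    return len(w) > 0 and apply_morphism(sigma, w) == apply_morphism(tau, w)


def is_gpcp_solution(sigma, tau, s, t, s2, t2, w) -> bool:
    """GPCP asks for s sigma(w) t == s' tau(w) t'; the empty word is allowed."""
    return s + apply_morphism(sigma, w) + t == s2 + apply_morphism(tau, w) + t2


def search_pcp(sigma: dict, tau: dict, max_len: int):
    """Return some solution of length <= max_len, or None if none is found."""
    for n in range(1, max_len + 1):
        for w in product(sorted(sigma), repeat=n):
            if is_pcp_solution(sigma, tau, w):
                return w
    return None


# Illustrative instance over the three-letter alphabet {a, b, c}.
sigma = {"a": "1", "b": "10111", "c": "10"}
tau = {"a": "111", "b": "10", "c": "0"}
found = search_pcp(sigma, tau, 4)
```

On this instance the word `baac` is a solution, and, in the spirit of Remark 1, `aac` solves the GPCP instance obtained by taking `sigma["b"]` and `tau["b"]` as the left border words and empty right border words.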



1.2 Semi-Thue systems 

Formally, a semi-Thue system is a pair $T = (\Sigma, R)$, where $\Sigma$ is an alphabet and
where $R$ is a subset of $\Sigma^* \times \Sigma^*$. The elements of $R$ are called the rules
of $T$. For every $x$, $y \in \Sigma^*$, we say that $y$ is immediately derivable from $x$
modulo $T$, and we write $x \mapsto_T y$, if there exist $s$, $t$, $z$, $z' \in \Sigma^*$ such
that $x = zsz'$, $y = ztz'$ and $(s,t) \in R$. For every $u$, $v \in \Sigma^*$, we say
that $v$ is derivable from $u$ modulo $T$, and we write $u \mapsto^*_T v$, if there exist an
integer $n \ge 0$ and $x_0$, $x_1$, ..., $x_n \in \Sigma^*$ such that $x_0 = u$, $x_n = v$, and
$x_{i-1} \mapsto_T x_i$ for every $i \in [1, n]$:

$$u = x_0 \mapsto_T x_1 \mapsto_T x_2 \mapsto_T \cdots \mapsto_T x_n = v . \tag{1}$$

In other words, $\mapsto^*_T$ is the reflexive-transitive closure of the binary relation
$\mapsto_T$. Define the Accessibility problem as: given a semi-Thue system $T$ and two words
$u$ and $v$ over the alphabet of $T$, decide whether $u \mapsto^*_T v$. For every integer
$k \ge 1$, define Accessibility($k$) as the restriction of Accessibility to instances
$(T, u, v)$ such that $T$ has $k$ rules.
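The derivation relation can be explored mechanically. The sketch below assumes rules are given as a list of pairs of strings; `one_step` and `derivable_bounded` are hypothetical helper names. Because Accessibility is undecidable in general, the search is bounded: a `True` answer certifies a derivation, whereas `False` only means that none was found within the bounds.

```python
from collections import deque


def one_step(rules, x: str) -> set:
    """Return all words immediately derivable from x (one rule application)."""
    out = set()
    for s, t in rules:
        start = 0
        while True:
            i = x.find(s, start)  # next occurrence of the left-hand side
            if i == -1:
                break
            out.add(x[:i] + t + x[i + len(s):])
            start = i + 1
    return out


def derivable_bounded(rules, u: str, v: str, max_len: int, max_steps: int) -> bool:
    """Breadth-first search for a derivation of v from u.

    Only words of length <= max_len and derivations of at most max_steps
    steps are explored, so a False result is inconclusive in general.
    """
    seen = {u}
    frontier = deque([(u, 0)])
    while frontier:
        x, n = frontier.popleft()
        if x == v:
            return True
        if n == max_steps:
            continue
        for y in one_step(rules, x):
            if len(y) <= max_len and y not in seen:
                seen.add(y)
                frontier.append((y, n + 1))
    return False
```

For instance, with the single rule `("ab", "ba")`, the word `bbaa` is derivable from `aabb` in four steps, while nothing other than `bbaa` itself is derivable from `bbaa`.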

1.3 Decidability 

Let $k$ be a positive integer. The decidabilities of Accessibility, PCP and GPCP are
linked by the following four facts.

Fact 1. If GPCP($k$) is decidable then PCP($k$) is decidable.

Fact 2. If GPCP($k+2$) is decidable then Accessibility($k$) is decidable.

Fact 3. If PCP($k+2$) is decidable then GPCP($k$) is decidable.

Fact 4 (Claus's theorem). If PCP($k+4$) is decidable then Accessibility($k$) is decidable.

Fact 1 follows from Remark 1. Fact 3 is [3, Theorem 3.2], and Fact 4 was originally
stated by Claus [2, Theorem 2] (see also [6, 8, 7]). To our knowledge, Fact 2 is explicitly
stated for the first time in the present paper.

Remark 2. The conjunction of Facts 2 and 3 yields Fact 4.

Since Matiyasevich and Sénizergues have shown that Accessibility(3) is undecidable
[9, Theorem 4.1], it follows from Fact 4 that PCP(7) is undecidable [9, Corollary 1]. In
the same way, Fact 2 yields that GPCP(5) is undecidable (see also [6, Theorem 7]). Those
results are the sharpest to date. Indeed, the decidability of each of the following eight
problems is still open:

• Accessibility(1), Accessibility(2),

• GPCP(3), GPCP(4),

• PCP(3), PCP(4), PCP(5) and PCP(6).

However, Ehrenfeucht and Rozenberg showed that PCP(2) and GPCP(2) are decidable [1]
(see also [31 E]).

1.4 Organization of the paper 

The aim of the paper is to present a clean, detailed proof of Fact 4. We start in Section 2
with some technicalities concerning Accessibility. Then, Fact 2 is proved in Section 3,
and Fact 3 is proved in Section 4.

2 More on the decidability of Accessibility



Definition 1. The language $\{010^n101 : n \ge 2\}$ is denoted by $C$. For each integer
$k \ge 1$, define $\mathcal{C}_k$ as the set of all instances $(T, u, v)$ of Accessibility such
that $u$, $v \in C^*$ and $T = (\{0,1\}, R)$ for some $k$-element subset
$R \subseteq C^+ \times C^+$.

The essential properties of the gadget language $C$ are: $C$ is an infinite, binary,
comma-free code (see Definitions 5 and 6 below), and no word in $C$ overlaps the delimiter
word 0011.

The aim of this section is to show: 

Proposition 1. For every integer $k \ge 1$, the general Accessibility($k$) problem is
decidable if and only if its restriction to $\mathcal{C}_k$ is decidable.

The idea to prove Proposition 1 is to construct a many-one reduction based on the
following gadget transformation:

Definition 2. Let $T = (\Sigma, R)$ be a semi-Thue system, let $\Delta$ be an alphabet, and let
$\alpha : \Sigma^* \to \Delta^*$. Define the image of $T$ under $\alpha$, denoted $\alpha(T)$,
as the semi-Thue system over $\Delta$ with rule set $\{(\alpha(s), \alpha(t)) : (s,t) \in R\}$.

The next two lemmas are straightforward. 

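Lemma 1. Let $\Sigma$ and $\hat\Sigma$ be alphabets, let $T$ be a semi-Thue system over
$\Sigma$, let $\hat T$ be a semi-Thue system over $\hat\Sigma$, and let
$\alpha : \Sigma^* \to \hat\Sigma^*$ be such that for every $x$, $y \in \Sigma^*$,
$x \mapsto_T y$ implies $\alpha(x) \mapsto_{\hat T} \alpha(y)$. For every $u$,
$v \in \Sigma^*$, $u \mapsto^*_T v$ implies $\alpha(u) \mapsto^*_{\hat T} \alpha(v)$.

Proof. Assume that $u \mapsto^*_T v$: there exist an integer $n \ge 0$ and $n + 1$ words
$x_0$, $x_1$, ..., $x_n$ over $\Sigma$ such that Equation (1) holds. It follows

$$\alpha(u) = \alpha(x_0) \mapsto_{\hat T} \alpha(x_1) \mapsto_{\hat T} \alpha(x_2) \mapsto_{\hat T} \cdots \mapsto_{\hat T} \alpha(x_n) = \alpha(v) , \tag{2}$$

and thus $\alpha(u) \mapsto^*_{\hat T} \alpha(v)$. □

Lemma 2. In the notation of Definition 2, if $\alpha$ is a morphism then for every $u$,
$v \in \Sigma^*$, $u \mapsto^*_T v$ implies $\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$.

Proof. We apply Lemma 1 with $\hat T := \alpha(T)$. Let $x$, $y \in \Sigma^*$ be such that
$x \mapsto_T y$: there exist $s$, $t$, $z$, $z' \in \Sigma^*$ such that $x = zsz'$, $y = ztz'$
and $(s,t) \in R$. Since $\alpha$ is a morphism, $\alpha(x)$ and $\alpha(y)$ can be parsed as
follows: $\alpha(x) = \alpha(z)\alpha(s)\alpha(z')$, $\alpha(y) = \alpha(z)\alpha(t)\alpha(z')$
and $(\alpha(s), \alpha(t))$ is a rule of $\alpha(T)$. Hence, we get that
$\alpha(x) \mapsto_{\alpha(T)} \alpha(y)$. □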

Definition 3. Let $(s,t)$ be a rule of some semi-Thue system: $(s,t)$ is a pair of words.
We say that $(s,t)$ is an insertion rule if $s = \varepsilon$. We say that $(s,t)$ is a
deletion rule if $t = \varepsilon$. A semi-Thue system is called $\varepsilon$-free if it has
neither insertion nor deletion rules. By extension, an instance $(T, u, v)$ of Accessibility is
called $\varepsilon$-free if the semi-Thue system $T$ is $\varepsilon$-free.

Note that every instance of Accessibility($k$) that belongs to $\mathcal{C}_k$ is
$\varepsilon$-free. The next two gadget morphisms play crucial roles in the proofs of both
Lemma 3 and Theorem 2 below.

Definition 4. Let $\Sigma$ be an alphabet and let $d$ be a letter. Define $\lambda_d$ and
$\rho_d$ as the morphisms from $\Sigma^*$ to $(\Sigma \cup \{d\})^*$ given by:
$\lambda_d(a) := da$ and $\rho_d(a) := ad$ for every $a \in \Sigma$.

For instance, $\lambda_d(01101)$ and $\rho_d(01101)$ equal $d0d1d1d0d1$ and $0d1d1d0d1d$,
respectively.
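The two gadget morphisms of Definition 4 are easy to experiment with. The sketch below assumes that words are Python strings and that the letter $d$ is passed as a one-character string; the names `lam` and `rho` are illustrative.

```python
def lam(d: str, w: str) -> str:
    """Morphism sending each letter a to d followed by a."""
    return "".join(d + a for a in w)


def rho(d: str, w: str) -> str:
    """Morphism sending each letter a to a followed by d."""
    return "".join(a + d for a in w)


left = lam("d", "01101")
right = rho("d", "01101")
```

Here `left` and `right` reproduce the two words of the example above. Appending `d` to `left` gives the same word as prepending `d` to `right`, an identity that the later proofs rely on.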

Lemma 3. For every integer $k \ge 1$, Accessibility($k$) is decidable if and only if the
problem is decidable on $\varepsilon$-free instances.

Proof. We present a many-one reduction from Accessibility($k$) in its general form to
Accessibility($k$) on $\varepsilon$-free instances.

Let $(T, u, v)$ be an instance of Accessibility($k$). Let $\Sigma$ denote the alphabet of $T$,
let $d$ be a symbol such that $d \notin \Sigma$, and let
$\mu : \Sigma^* \to (\Sigma \cup \{d\})^*$ be defined by:
$\mu(w) := \lambda_d(w)d = d\rho_d(w)$ for every $w \in \Sigma^*$. Clearly,
$(\mu(T), \mu(u), \mu(v))$ is an $\varepsilon$-free instance of Accessibility($k$), and
$(\mu(T), \mu(u), \mu(v))$ is computable from $(T, u, v)$.

It remains to check the correctness statement: $u \mapsto^*_T v$ if and only if
$\mu(u) \mapsto^*_{\mu(T)} \mu(v)$.

(only if). Let $x$, $y \in \Sigma^*$ be such that $x \mapsto_T y$: there exist $s$, $t$, $z$,
$z' \in \Sigma^*$ such that $x = zsz'$, $y = ztz'$ and $(s,t)$ is a rule of $T$. Clearly,
$\mu(x)$ and $\mu(y)$ can be parsed as follows: $\mu(x) = \lambda_d(z)\mu(s)\rho_d(z')$,
$\mu(y) = \lambda_d(z)\mu(t)\rho_d(z')$ and $(\mu(s), \mu(t))$ is a rule of $\mu(T)$. Hence,
we get that $\mu(x) \mapsto_{\mu(T)} \mu(y)$. It now follows from Lemma 1 (applied with
$\alpha := \mu$ and $\hat T := \mu(T)$) that $u \mapsto^*_T v$ implies
$\mu(u) \mapsto^*_{\mu(T)} \mu(v)$.

(if). Let $\bar\mu : (\Sigma \cup \{d\})^* \to \Sigma^*$ denote the morphism defined by:
$\bar\mu(a) := a$ for every $a \in \Sigma$ and $\bar\mu(d) := \varepsilon$. It is clear that
$\bar\mu(\mu(w)) = w$ for every $w \in \Sigma^*$, and thus $T = \bar\mu(\mu(T))$.
Hence, for every $\hat u$, $\hat v \in (\Sigma \cup \{d\})^*$,
$\hat u \mapsto^*_{\mu(T)} \hat v$ implies $\bar\mu(\hat u) \mapsto^*_T \bar\mu(\hat v)$ by
Lemma 2 (applied with $\alpha := \bar\mu$ and $T := \mu(T)$). In particular,
$\mu(u) \mapsto^*_{\mu(T)} \mu(v)$ implies
$u = \bar\mu(\mu(u)) \mapsto^*_T \bar\mu(\mu(v)) = v$.

□ 

Given an alphabet $\Sigma$, a semi-Thue system $T$ over $\Sigma$, and a subset
$L \subseteq \Sigma^*$, we say that $L$ is closed under derivation modulo $T$ if for every
$x \in L$ and every $y \in \Sigma^*$, $x \mapsto_T y$ implies $y \in L$. The next lemma is an
ad hoc counterpart of Lemma 1.

Lemma 4. Let $\Sigma$ and $\hat\Sigma$ be alphabets, let $T$ be a semi-Thue system over
$\Sigma$, let $\hat T$ be a semi-Thue system over $\hat\Sigma$, and let
$\alpha : \Sigma^* \to \hat\Sigma^*$ be such that:

(i) the range of $\alpha$ is closed under derivation modulo $\hat T$, and

(ii) for every $x$, $y \in \Sigma^*$, $\alpha(x) \mapsto_{\hat T} \alpha(y)$ implies
$x \mapsto_T y$.

For every $u$, $v \in \Sigma^*$, $\alpha(u) \mapsto^*_{\hat T} \alpha(v)$ implies
$u \mapsto^*_T v$.

Proof. Assume that $\alpha(u) \mapsto^*_{\hat T} \alpha(v)$: there exist an integer $n \ge 0$
and $n + 1$ words $\hat x_0$, $\hat x_1$, ..., $\hat x_n$ over $\hat\Sigma$ such that

$$\alpha(u) = \hat x_0 \mapsto_{\hat T} \hat x_1 \mapsto_{\hat T} \hat x_2 \mapsto_{\hat T} \cdots \mapsto_{\hat T} \hat x_n = \alpha(v) .$$

It follows from point (i) that $\hat x_i$ belongs to the range of $\alpha$ for every
$i \in [0, n]$: let $x_0 := u$, let $x_n := v$, and for each $i \in [1, n-1]$, let
$x_i \in \Sigma^*$ be such that $\hat x_i = \alpha(x_i)$. Now, Equation (2) holds, and thus
Equation (1) follows by point (ii). We have thus shown that $u \mapsto^*_T v$. □

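Surprisingly, hypothesis (i) of Lemma 4 is not disposable. Indeed, let $T = (\Sigma, R)$ be a
semi-Thue system, and let $u_0$, $v_0 \in \Sigma^*$ be such that $v_0$ is not derivable from
$u_0$ modulo $T$: a trivial choice for $R$ is the empty set. Let $a$ be a symbol such that
$a \notin \Sigma$, and let $\hat T := (\Sigma \cup \{a\}, R \cup \{(u_0, a), (a, v_0)\})$.
For every $x$, $y \in \Sigma^*$, $x \mapsto_T y$ is equivalent to $x \mapsto_{\hat T} y$.
However, $\Sigma^*$ is not closed under derivation modulo $\hat T$, and
$u \mapsto^*_{\hat T} v$ does not imply $u \mapsto^*_T v$ for every $u$, $v \in \Sigma^*$,
since $u_0 \mapsto_{\hat T} a \mapsto_{\hat T} v_0$.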

Definition 5. Let $X$ be a language. We say that $X$ is a code if the property

$$x_1 x_2 \cdots x_m = y_1 y_2 \cdots y_n \implies (x_1, x_2, \ldots, x_m) = (y_1, y_2, \ldots, y_n)$$

holds for any integers $m$, $n \ge 1$ and any elements $x_1$, $x_2$, ..., $x_m$, $y_1$, $y_2$,
..., $y_n \in X$.

Note that $(x_1, x_2, \ldots, x_m) = (y_1, y_2, \ldots, y_n)$ means that both $m = n$ and
$x_i = y_i$ for every $i \in [1, m]$. In other words, a language $X$ is a code if each word in
$X^*$ has a unique factorization over $X$. A morphism $\sigma : \Sigma^* \to \Delta^*$ is
injective if and only if $\sigma$ is injective on $\Sigma$ and $\sigma(\Sigma)$ is a code.

Definition 6 ([1]). A code $X$ is called comma-free if for all words $x$, $z$ and $z'$,

$$(x \in X^+ \text{ and } zxz' \in X^*) \implies (z \in X^* \text{ and } z' \in X^*) .$$

Every comma-free code is a bifix code: no word in the language is a prefix or a suffix of
another word in the language. For instance, $K := \{10^n1 : n \ge 1\}$ and $C$ are comma-free
codes, but $K \cup \{11\}$ is a bifix code which is not comma-free.

Lemma 5. In the notation of Definition 2, assume that

(i) $\alpha$ is an injective morphism,

(ii) $\alpha(\Sigma)$ is a comma-free code, and

(iii) $T$ has no insertion rule.

For every $u$, $v \in \Sigma^*$, $u \mapsto^*_T v$ is equivalent to
$\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$.

Proof. According to Lemma 2, $u \mapsto^*_T v$ implies
$\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$. Conversely, let us prove that
$\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$ implies $u \mapsto^*_T v$. We rely on Lemma 4.

Let $\hat x$, $\hat y \in \Delta^*$ be such that $\hat x$ belongs to the range of $\alpha$ and
$\hat x \mapsto_{\alpha(T)} \hat y$: there exist $\hat z$, $\hat z' \in \Delta^*$ and
$(s,t) \in R$ such that $\hat x = \hat z\alpha(s)\hat z'$ and
$\hat y = \hat z\alpha(t)\hat z'$. Since $\alpha$ is a morphism, the range of $\alpha$ equals
$\alpha(\Sigma)^*$. In particular, both $\alpha(s)$ and $\hat z\alpha(s)\hat z'$ belong to
$\alpha(\Sigma)^*$. Furthermore, $\alpha(s)$ belongs to $\alpha(\Sigma)^+$: indeed, $s$ is a
non-empty word because $T$ has no insertion rule, and thus its image under the injective
morphism $\alpha$ is also a non-empty word. It follows that both $\hat z$ and $\hat z'$ belong
to $\alpha(\Sigma)^*$ because $\alpha(\Sigma)$ is a comma-free code: there exist $z$,
$z' \in \Sigma^*$ such that $\alpha(z) = \hat z$ and $\alpha(z') = \hat z'$. We can now write
$\hat x$ and $\hat y$ in the forms $\hat x = \alpha(zsz')$ and $\hat y = \alpha(ztz')$. Hence,
$\hat y$ belongs to the range of $\alpha$, which proves that the range of $\alpha$ is closed
under derivation modulo $\alpha(T)$. Moreover, we also get
$\alpha^{-1}(\hat x) = zsz' \mapsto_T ztz' = \alpha^{-1}(\hat y)$. Therefore, Lemma 4 applies
with $\hat T := \alpha(T)$. □

Let us thoroughly examine the hypotheses of Lemma 5. Hypothesis (iii) could be
replaced with "$T$ has no deletion rule": apply Lemma 5 in its original form to the reversal
of $T$, defined as $\tilde T := (\Sigma, \{(t, s) : (s, t) \in R\})$. However, the following two
counterexamples show that neither hypothesis (ii) nor hypothesis (iii) is disposable.

Counterexample 1. Let $T := (\{a, b\}, \{(a, aa)\})$ and let
$\alpha : \{a, b\}^* \to \{0,1\}^*$ be the morphism defined by $\alpha(a) := 01$ and
$\alpha(b) := 011$: $\alpha(T) = (\{0,1\}, \{(01, 0101)\})$. Clearly, $\alpha$ is injective and
$T$ is $\varepsilon$-free. However, $\alpha(\{a, b\})$ is not a comma-free code, and
$\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$ does not imply $u \mapsto^*_T v$ for every $u$,
$v \in \{a, b\}^*$: $\alpha(b) \mapsto_{\alpha(T)} \alpha(ab)$ but $ab$ is not derivable from
$b$ modulo $T$.

Counterexample 1 disproves a claim from Claus's original paper [2, page 57, line $-4$].
A statement from Harju, Karhumäki and Krob [H page 43, line 1] is disproved in the same
way.

Counterexample 2. Let $T := (\{a, b, c\}, \{(\varepsilon, a), (b, \varepsilon)\})$ and let
$\alpha : \{a, b, c\}^* \to \{0,1\}^*$ be the morphism defined by $\alpha(a) := 101$,
$\alpha(b) := 1001$ and $\alpha(c) := 10001$:
$\alpha(T) = (\{0,1\}, \{(\varepsilon, 101), (1001, \varepsilon)\})$. Clearly, $\alpha$ is
injective and $\alpha(\{a, b, c\})$ is a comma-free code. However, $T$ admits both insertion
and deletion rules, and $\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$ does not imply
$u \mapsto^*_T v$ for every $u$, $v \in \{a, b, c\}^*$: $a$ is not derivable from $c$ modulo
$T$, but

$$\alpha(c) \mapsto_{\alpha(T)} 10101001 \mapsto_{\alpha(T)} 10101001011 \mapsto_{\alpha(T)} 1010011 \mapsto_{\alpha(T)} \alpha(a) .$$
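The four-step derivation of Counterexample 2 can be checked mechanically. The sketch below is an illustrative verification, assuming words are Python strings and rules are pairs of strings; `one_step` is a hypothetical helper that lists all words reachable by a single rule application.

```python
def one_step(rules, x: str) -> set:
    """Return all words immediately derivable from x (one rule application)."""
    out = set()
    for s, t in rules:
        start = 0
        while True:
            i = x.find(s, start)  # the empty left-hand side matches everywhere
            if i == -1:
                break
            out.add(x[:i] + t + x[i + len(s):])
            start = i + 1
    return out


alpha = {"a": "101", "b": "1001", "c": "10001"}
rules_T = [("", "a"), ("b", "")]  # one insertion rule and one deletion rule


def image(w: str) -> str:
    """Image of a word over {a, b, c} under the morphism alpha."""
    return "".join(alpha[x] for x in w)


rules_image = [(image(s), image(t)) for s, t in rules_T]
chain = ["10001", "10101001", "10101001011", "1010011", "101"]
chain_ok = all(y in one_step(rules_image, x) for x, y in zip(chain, chain[1:]))
```

The flag `chain_ok` confirms that each word of the chain is immediately derivable from the previous one modulo the image system. On the other side, no rule of `rules_T` can remove the letter `c`, which is why `a` cannot be reached from `c`.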

Proof of Proposition 1. We present a many-one reduction from Accessibility($k$) on
$\varepsilon$-free instances to Accessibility($k$) on $\mathcal{C}_k$, so that Lemma 3 applies.

Let $(T, u, v)$ be an $\varepsilon$-free instance of Accessibility($k$). Let $\Sigma$ denote
the alphabet of $T$. Compute an injection $\alpha : \Sigma \to C$. The morphism from
$\Sigma^*$ to $\{0,1\}^*$ that extends $\alpha$ is also denoted $\alpha$. Clearly,
$(\alpha(T), \alpha(u), \alpha(v))$ belongs to $\mathcal{C}_k$, and
$(\alpha(T), \alpha(u), \alpha(v))$ is computable from $(T, u, v)$. Moreover,
$u \mapsto^*_T v$ is equivalent to $\alpha(u) \mapsto^*_{\alpha(T)} \alpha(v)$ by Lemma 5. □

3 From GPCP to Accessibility

The key ingredient of the proof of Fact 2 is the accordion lemma (Lemma 7 below).

Definition 7. A word $f$ is called bordered if there exist three non-empty words $u$, $v$ and
$x$ such that $f = xu = vx$.

Equivalently, $f$ is bordered if and only if there exists a word $z$ with $|z| < 2|f|$ such
that $f$ occurs twice or more in $z$.

Definition 8. We say that two words $x$ and $y$ overlap if at least one of the following four
assertions holds:

(i) $x$ occurs in $y$,

(ii) $y$ occurs in $x$,

(iii) some non-empty prefix of $x$ is a suffix of $y$, or

(iv) some non-empty prefix of $y$ is a suffix of $x$.

Since we make the convention that the empty word occurs in every word, the empty
word and any other word do overlap. Two non-empty words $x$ and $y$ overlap if and only if
there exists a word $z$ with $|z| < |x| + |y|$ such that both $x$ and $y$ occur in $z$.
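Definitions 7 and 8 can be tested directly on small words. The following is a minimal sketch, assuming words are Python strings; the helper names are illustrative. It is used here to confirm on a few samples that the delimiter word 0011 is unbordered and does not overlap products of words of the gadget language.

```python
def is_bordered(f: str) -> bool:
    """True if some non-empty proper prefix of f is also a suffix of f."""
    return any(f[:i] == f[-i:] for i in range(1, len(f)))


def overlap(x: str, y: str) -> bool:
    """True if x and y overlap in the sense of the four assertions above."""
    if x in y or y in x:  # assertions (i) and (ii)
        return True
    m = min(len(x), len(y))
    # assertions (iii) and (iv): a non-empty prefix of one is a suffix of the other
    return any(x[:i] == y[-i:] or y[:i] == x[-i:] for i in range(1, m + 1))


def gadget(n: int) -> str:
    """The word 01 0^n 101 of the gadget language (n >= 2 is assumed)."""
    return "01" + "0" * n + "101"


samples = [gadget(2), gadget(3), gadget(2) + gadget(3), gadget(4) + gadget(2) + gadget(2)]
no_overlap = all(not overlap(s, "0011") for s in samples)
```

The check over `samples` is only a spot test on a handful of words, not a proof of the general claim.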

We can now state a protoversion of the accordion lemma. 

Lemma 6. Let $T = (\Sigma, R)$ be a semi-Thue system, and let $f$, $u$, $v \in \Sigma^*$ be
such that:

(i) $f$ is unbordered,

(ii) $f$ does not occur in $u$,

(iii) $f$ does not occur in $v$, and

(iv) for each rule $(s, t) \in R$, $s$ and $f$ do not overlap.

Then, $u \mapsto^*_T v$ holds if and only if there exist $x$, $y \in \Sigma^*$ satisfying both
$xfv = ufy$ and $x \mapsto^*_T y$.

Proof. (only if). If $u \mapsto^*_T v$ then $x := u$ and $y := v$ are such that $xfv = ufy$
and $x \mapsto^*_T y$.

(if). Assume that there exist $x$, $y \in \Sigma^*$ such that $xfv = ufy$ and
$x \mapsto^*_T y$. Let $n$ denote the number of occurrences of $f$ in $xfv$. Since $f$ is
unbordered (hypothesis (i)), those occurrences are pairwise non-overlapping:

$$xfv = ufy = w_0 f w_1 f w_2 \cdots f w_n$$

for some words $w_0$, $w_1$, ..., $w_n \in \Sigma^*$. Since $f$ does not occur in $v$
(hypothesis (iii)), we have $v = w_n$ and

$$x = w_0 f w_1 f w_2 \cdots f w_{n-1} .$$

In the same way, $f$ does not occur in $u$ either (hypothesis (ii)), and thus we have also
$u = w_0$ and

$$y = w_1 f w_2 f w_3 \cdots f w_n .$$

Now, remark that $f$ plays the role of a delimiter with respect to the derivation modulo $T$
(hypothesis (iv)): $x \mapsto^*_T y$ implies

$$w_{i-1} \mapsto^*_T w_i$$

for every $i \in [1, n]$. Therefore, $u \mapsto^*_T v$ holds.

□

Let us comment on the statement of Lemma 6. Hypothesis (iv) implies that $T$ has no
insertion rule. It could be replaced with "for every $(s, t) \in R$, $t$ and $f$ do not
overlap". Hypothesis (ii) is in fact disposable: the verification is left to the reader. Let
$\Sigma$ be an alphabet, let $f$ be a symbol such that $f \notin \Sigma$, and let $T$ be a
semi-Thue system over $\Sigma \cup \{f\}$ with rules in $\Sigma^+ \times \Sigma^*$. An easy
consequence of Lemma 6 is that, for every $u$, $v \in \Sigma^*$, $u \mapsto^*_T v$ holds if and
only if there exist $x$, $y \in (\Sigma \cup \{f\})^*$ satisfying both $xfv = ufy$ and
$x \mapsto^*_T y$.

Definition 9. The word 0011 is denoted by $f$.

Lemma 7 (Accordion lemma). Let $k$ be a positive integer. For every
$(T, u, v) \in \mathcal{C}_k$, $(T, u, v)$ is a yes-instance of Accessibility($k$) if and only
if there exist $x$, $y \in \{0,1\}^*$ satisfying both $xfv = ufy$ and $x \mapsto^*_T y$.

Proof. Clearly, $f$ is unbordered, and for every $s \in C^+$, $s$ and $f$ do not overlap.
Hence, Lemma 6 applies. □

The statement of the accordion lemma can be made precise as follows (the verification
is left to the reader): for any $(T, u, v) \in \mathcal{C}_k$ and any $x$,
$y \in \{0,1\}^*$ such that $xfv = ufy$ and $x \mapsto^*_T y$, both $x$ and $y$ belong to
$(C \cup \{f\})^*$. Besides, if $C$ and $f$ were defined as $C := \{10^n1 : n \ge 1\}$ and
$f := 11$ in Definitions 1 and 9 then Proposition 1 and Lemma 7 would hold (the verification
is left to the reader). In [7, Theorem 4.1], Harju and Karhumäki present a proof of Claus's
theorem that implicitly relies on those variants of Proposition 1 and Lemma 7.

Definition 10. An instance $(\Sigma, \sigma, \tau, s, t, s', t')$ of GPCP is called
erasement-free if $\sigma(\Sigma) \cup \tau(\Sigma) \subseteq \{0,1\}^+$.

We can now prove a slightly strengthened version of Fact 2.

Theorem 1. Let $k$ be a positive integer. If GPCP($k+2$) is decidable on erasement-free
instances then Accessibility($k$) is decidable.

Proof. In order to apply Proposition 1, we present a many-one reduction from
Accessibility($k$) on $\mathcal{C}_k$ to GPCP($k+2$) on erasement-free instances.

Let $I := (T, u, v)$ be an element of $\mathcal{C}_k$: there exist $s_1$, $s_2$, ..., $s_k$,
$t_1$, $t_2$, ..., $t_k \in C^+$ such that
$T = (\{0,1\}, \{(s_1, t_1), (s_2, t_2), \ldots, (s_k, t_k)\})$.

Let $a_1$, $a_2$, ..., $a_k$ be symbols such that $\Sigma := \{0, 1, a_1, a_2, \ldots, a_k\}$ is
an alphabet of cardinality $k + 2$. Let $\sigma$, $\tau : \Sigma^* \to \{0,1\}^*$ be the
morphisms defined by:

$$\sigma(0) := 0 , \quad \tau(0) := 0 ,$$
$$\sigma(1) := 1 , \quad \tau(1) := 1 ,$$
$$\sigma(a_i) := s_i , \quad \tau(a_i) := t_i$$

for every $i \in [1, k]$. Let $J$ denote the instance
$(\Sigma, \sigma, \tau, \varepsilon, fv, uf, \varepsilon)$ of GPCP($k+2$).

It is clear that $J$ is erasement-free and that $J$ is computable from $I$. It remains to
check that $I$ is a yes-instance of Accessibility($k$) if and only if $J$ is a yes-instance of
GPCP($k+2$). The proof of the "if part" relies on the accordion lemma while the proof of
the "only if part" relies on the next lemma.

Lemma 8. For any $x$, $y \in \{0,1\}^*$ such that $x \mapsto_T y$, there exists
$z \in \Sigma^*$ such that $x = \sigma(z)$ and $y = \tau(z)$.

Proof. Let $z'$, $z'' \in \{0,1\}^*$ and let $i \in [1, k]$ be such that $x = z's_iz''$ and
$y = z't_iz''$. A suitable choice for $z$ is $z'a_iz''$. □

(only if). Assume that $u \mapsto^*_T v$. There exist an integer $n \ge 0$ and $n + 1$ words
$x_0$, $x_1$, ..., $x_n$ over $\{0,1\}$ such that Equation (1) holds. Lemma 8 ensures that
there exists $z_i \in \Sigma^*$ satisfying $x_{i-1} = \sigma(z_i)$ and $x_i = \tau(z_i)$ for
each $i \in [1, n]$. Now, $w := z_1 f z_2 f z_3 \cdots f z_n$ is such that
$\sigma(w)fv = uf\tau(w)$. (If $n = 0$, that is, if $u = v$, then $w := u$ is a suitable
choice, since $\sigma$ and $\tau$ fix every word over $\{0,1\}$.)

(if). Assume that there exists $w \in \Sigma^*$ such that $\sigma(w)fv = uf\tau(w)$. The
morphisms $\sigma$ and $\tau$ are defined in such a way that
$\sigma(z) \mapsto^*_T \tau(z)$ for every $z \in \Sigma^*$. In particular, $x := \sigma(w)$
and $y := \tau(w)$ are such that $xfv = ufy$ and $x \mapsto^*_T y$. Hence, Lemma 7 yields
$u \mapsto^*_T v$.

□
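The reduction used in the proof of Theorem 1 can be illustrated on a toy instance. The sketch below is an assumption-laden example, not the paper's own construction code: words over the enlarged alphabet are represented as lists of symbols, the extra letters are named `a1`, `a2`, and so on, and the sample rule and words are chosen only for illustration.

```python
F = "0011"  # the delimiter word f


def gadget(n: int) -> str:
    """The word 01 0^n 101 of the gadget language (n >= 2 is assumed)."""
    return "01" + "0" * n + "101"


def build_gpcp(rules, u: str, v: str):
    """Build (sigma, tau, s, t, s', t') from a rule list and two binary words."""
    sigma = {"0": "0", "1": "1"}
    tau = {"0": "0", "1": "1"}
    for i, (s, t) in enumerate(rules, start=1):
        sigma[f"a{i}"] = s  # a_i is sent to the left-hand side by sigma
        tau[f"a{i}"] = t    # and to the right-hand side by tau
    return sigma, tau, "", F + v, u + F, ""


def image(m: dict, w) -> str:
    return "".join(m[a] for a in w)


def is_solution(inst, w) -> bool:
    sigma, tau, s, t, s2, t2 = inst
    return s + image(sigma, w) + t == s2 + image(tau, w) + t2


# Toy instance with one rule: derive c3 c3 from c2 c2 in two steps.
c2, c3 = gadget(2), gadget(3)
rules = [(c2, c3)]
inst = build_gpcp(rules, c2 + c2, c3 + c3)
# Accordion witness z_1 f z_2 with z_1 = a1 c2 and z_2 = c3 a1.
w = ["a1"] + list(c2) + list(F) + list(c3) + ["a1"]
```

The word `w` encodes the two derivation steps separated by the delimiter, following the "only if" part of the proof. When the source and target words coincide, the source word itself (as a list of bits) serves as a witness.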

Combining Theorem 1 and [9, Theorem 4.1] we obtain:

Corollary 1. GPCP(5) is undecidable on erasement-free instances.






4 From PCP to GPCP 



Definition 11. An instance $(\Sigma, \sigma, \tau, s, t, s', t')$ of GPCP is called
$(\varepsilon,\varepsilon)$-free if for every $a \in \Sigma$,
$(\sigma(a), \tau(a)) \ne (\varepsilon, \varepsilon)$.

Lemma 9. For every integer $k \ge 1$, GPCP($k$) is decidable if and only if the problem is
decidable on $(\varepsilon,\varepsilon)$-free instances.

Proof. We present a many-one reduction from GPCP($k$) to GPCP($k$) on
$(\varepsilon,\varepsilon)$-free instances.

Let $I := (\Sigma, \sigma, \tau, s, t, s', t')$ be an instance of GPCP($k$). Compute the set
$\hat\Sigma$ of all letters $a \in \Sigma$ such that
$(\sigma(a), \tau(a)) \ne (\varepsilon, \varepsilon)$. If $\hat\Sigma$ is empty then solving
GPCP($k$) on $I$ reduces to checking whether $st$ and $s't'$ are equal. Hence, we may assume
$\hat\Sigma \ne \emptyset$ without loss of generality, taking out of the way cumbersome
considerations. Let $\hat\sigma$ and $\hat\tau$ denote the restrictions to $\hat\Sigma^*$ of
$\sigma$ and $\tau$, respectively. Let $J$ denote the septuple
$(\hat\Sigma, \hat\sigma, \hat\tau, s, t, s', t')$. Clearly, $J$ is an
$(\varepsilon,\varepsilon)$-free instance of GPCP($k$) and $J$ is computable from $I$.
Moreover, $I$ is a yes-instance of GPCP($k$) if and only if $J$ is also a yes-instance of the
problem. □

Remark that every erasement-free instance of GPCP is $(\varepsilon,\varepsilon)$-free, but the
converse is false in general.

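Definition 12. An instance $(\Sigma, \sigma, \tau)$ of PCP is called erasement-free if
$\sigma(\Sigma) \cup \tau(\Sigma) \subseteq \{0,1\}^+$.

We can now prove Fact 3.

Theorem 2. Let $k$ be a positive integer.

(i). If PCP($k+2$) is decidable then GPCP($k$) is decidable.

(ii). If PCP($k+2$) is decidable on erasement-free instances then GPCP($k$) is decidable
on erasement-free instances.

Proof. We present a many-one reduction from GPCP($k$) on $(\varepsilon,\varepsilon)$-free
instances to PCP($k+2$) in order to apply Lemma 9.

Let $I := (\Sigma, \sigma, \tau, s, t, s', t')$ be an $(\varepsilon,\varepsilon)$-free instance
of GPCP($k$). Without loss of generality, we may assume $\mathtt{b} \notin \Sigma$ and
$\mathtt{e} \notin \Sigma$: $\hat\Sigma := \Sigma \cup \{\mathtt{b}, \mathtt{e}\}$ is an
alphabet of cardinality $k + 2$. Let $\lambda := \lambda_d$ and $\rho := \rho_d$ (see
Definition 4). Let $\hat\sigma$, $\hat\tau : \hat\Sigma^* \to \{0, 1, d, \mathtt{b}, \mathtt{e}\}^*$
be the two morphisms defined by:

$$\hat\sigma(\mathtt{b}) := \mathtt{b}\lambda(s) , \quad \hat\tau(\mathtt{b}) := \mathtt{b}d\rho(s') ,$$
$$\hat\sigma(\mathtt{e}) := \lambda(t)d\mathtt{e} , \quad \hat\tau(\mathtt{e}) := \rho(t')\mathtt{e} ,$$
$$\hat\sigma(a) := \lambda(\sigma(a)) , \quad \hat\tau(a) := \rho(\tau(a))$$

for every $a \in \Sigma$. Let $j : \{0, 1, d, \mathtt{b}, \mathtt{e}\}^* \to \{0,1\}^*$ denote
an injective morphism: for instance $j$ can be given by $j(0) := 000$, $j(1) := 111$,
$j(d) := 101$, $j(\mathtt{b}) := 100$ and $j(\mathtt{e}) := 001$.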
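It is clear that $J := (\hat\Sigma, j \circ \hat\sigma, j \circ \hat\tau)$ is an instance of
PCP($k+2$) computable from $I$, and that $J$ is erasement-free whenever $I$ is
erasement-free. Hence, to prove both points (i) and (ii) of Theorem 2, it remains to check
that $I$ is a yes-instance of GPCP($k$) if and only if $J$ is a yes-instance of PCP($k+2$).

Lemma 10. For every $w \in \Sigma^*$, $s\sigma(w)t = s'\tau(w)t'$ if and only if
$\hat\sigma(\mathtt{b}w\mathtt{e}) = \hat\tau(\mathtt{b}w\mathtt{e})$.

Proof. Straightforward computations yield

$$\hat\sigma(\mathtt{b}w\mathtt{e}) = \hat\sigma(\mathtt{b})\hat\sigma(w)\hat\sigma(\mathtt{e}) = \mathtt{b}\lambda(s)\lambda(\sigma(w))\lambda(t)d\mathtt{e} = \mathtt{b}\lambda(s\sigma(w)t)d\mathtt{e}$$

and

$$\hat\tau(\mathtt{b}w\mathtt{e}) = \hat\tau(\mathtt{b})\hat\tau(w)\hat\tau(\mathtt{e}) = \mathtt{b}d\rho(s')\rho(\tau(w))\rho(t')\mathtt{e} = \mathtt{b}d\rho(s'\tau(w)t')\mathtt{e} .$$

Since $\lambda(x)d = d\rho(x)$ for every $x \in \{0,1\}^*$, $s\sigma(w)t = s'\tau(w)t'$
implies $\hat\sigma(\mathtt{b}w\mathtt{e}) = \hat\tau(\mathtt{b}w\mathtt{e})$, and furthermore,
$\hat\sigma(\mathtt{b}w\mathtt{e}) = \hat\tau(\mathtt{b}w\mathtt{e})$ implies
$\lambda(s\sigma(w)t) = \lambda(s'\tau(w)t')$. Since $\lambda$ is trivially injective,
$\hat\sigma(\mathtt{b}w\mathtt{e}) = \hat\tau(\mathtt{b}w\mathtt{e})$ implies
$s\sigma(w)t = s'\tau(w)t'$. □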
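The equivalence of Lemma 10 can be spot-checked on a small instance. The sketch below assumes single-character letters, with `b`, `e` and `d` reserved as the marker letters; the sample GPCP instance over the alphabet {p, q} and all helper names are illustrative choices, not taken from the paper.

```python
from itertools import product

D, B, E = "d", "b", "e"  # marker letters, assumed absent from the alphabet


def lam(w: str) -> str:
    return "".join(D + a for a in w)


def rho(w: str) -> str:
    return "".join(a + D for a in w)


def image(m: dict, w) -> str:
    return "".join(m[a] for a in w)


def hat_morphisms(sigma, tau, s, t, s2, t2):
    """Build the two morphisms over the alphabet extended with b and e."""
    sigma_hat = {a: lam(x) for a, x in sigma.items()}
    tau_hat = {a: rho(x) for a, x in tau.items()}
    sigma_hat[B], tau_hat[B] = B + lam(s), B + D + rho(s2)
    sigma_hat[E], tau_hat[E] = lam(t) + D + E, rho(t2) + E
    return sigma_hat, tau_hat


# Injective binary coding j suggested in the text.
J = {"0": "000", "1": "111", "d": "101", "b": "100", "e": "001"}

# Sample GPCP instance over {p, q}; "ppq" is a solution.
sigma, tau = {"p": "1", "q": "10"}, {"p": "111", "q": "0"}
s, t, s2, t2 = "10111", "", "10", ""
sigma_hat, tau_hat = hat_morphisms(sigma, tau, s, t, s2, t2)

agree = all(
    (s + image(sigma, w) + t == s2 + image(tau, w) + t2)
    == (image(sigma_hat, B + "".join(w) + E) == image(tau_hat, B + "".join(w) + E))
    for n in range(4)
    for w in product("pq", repeat=n)
)
```

The flag `agree` records that, for every word of length at most 3 over {p, q}, the GPCP equation holds exactly when the two extended images of the bracketed word coincide. Composing with `J` turns the latter into an equality of binary words.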

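If $I$ is a yes-instance of GPCP($k$) then it follows from Lemma 10 that $J$ is a
yes-instance of PCP($k+2$). The converse is slightly more complicated to prove.

Lemma 11. For every $w \in \hat\Sigma^*$, the following three assertions are equivalent:

1. $\hat\sigma(w\mathtt{e})$ is a prefix of $\hat\tau(w\mathtt{e})$,

2. $\hat\tau(w\mathtt{e})$ is a prefix of $\hat\sigma(w\mathtt{e})$, and

3. $\hat\sigma(w\mathtt{e}) = \hat\tau(w\mathtt{e})$.

Proof. The letter $\mathtt{e}$ occurs once in $\hat\sigma(\mathtt{e})$ (resp.
$\hat\tau(\mathtt{e})$) whereas for every $a \in \Sigma \cup \{\mathtt{b}\}$, $\mathtt{e}$
does not occur at all in $\hat\sigma(a)$ (resp. $\hat\tau(a)$). Therefore,
$|\hat\sigma(x)|_{\mathtt{e}} = |x|_{\mathtt{e}} = |\hat\tau(x)|_{\mathtt{e}}$ holds for
every $x \in \hat\Sigma^*$. Since $\mathtt{e}$ is the last letter of $\hat\sigma(\mathtt{e})$,
any proper prefix of $\hat\sigma(w\mathtt{e})$ contains fewer occurrences of $\mathtt{e}$ than
$\hat\tau(w\mathtt{e})$. From that we deduce that $\hat\tau(w\mathtt{e})$ cannot be a proper
prefix of $\hat\sigma(w\mathtt{e})$. In the same way, $\hat\sigma(w\mathtt{e})$ cannot be a
proper prefix of $\hat\tau(w\mathtt{e})$. □

Lemma 12. For every $w \in \hat\Sigma^*$, the following three assertions are equivalent:

1. $\hat\sigma(\mathtt{b}w)$ is a suffix of $\hat\tau(\mathtt{b}w)$,

2. $\hat\tau(\mathtt{b}w)$ is a suffix of $\hat\sigma(\mathtt{b}w)$, and

3. $\hat\sigma(\mathtt{b}w) = \hat\tau(\mathtt{b}w)$.

Proof. Lemma 12 is proved in the same way as Lemma 11. The details are left to the
reader. □

Claim 1. Let $a \in \hat\Sigma$ be such that $\hat\sigma(a) \ne \varepsilon$.

(i). The first letter of $\hat\sigma(a)$ is either $\mathtt{b}$ or $d$.

(ii). The last letter of $\hat\sigma(a)$ is distinct from $d$.

Claim 2. Let $a \in \hat\Sigma$ be such that $\hat\tau(a) \ne \varepsilon$.

(i). The first letter of $\hat\tau(a)$ is distinct from $d$.

(ii). The last letter of $\hat\tau(a)$ is either $d$ or $\mathtt{e}$.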

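Assume that $J$ is a yes-instance of PCP($k+2$). Let $w \in \hat\Sigma^+$ be such that
$\hat\sigma(w) = \hat\tau(w)$; such a word exists because $j$ is injective. Let $x$ denote both
words $\hat\sigma(w)$ and $\hat\tau(w)$.

Since $I$ is an $(\varepsilon,\varepsilon)$-free instance of GPCP,
$(\hat\sigma(a), \hat\tau(a))$ is distinct from $(\varepsilon, \varepsilon)$ for every
$a \in \hat\Sigma$, and thus $x$ is a non-empty word. Combining Claims 1(i) and 2(i), we
obtain that $\mathtt{b}$ is the first letter of $x$, and thus $\mathtt{b}$ is also the first
letter of $w$. In the same way, combining Claims 1(ii) and 2(ii), we obtain that $\mathtt{e}$
is the last letter of $x$, and thus $\mathtt{e}$ is also the last letter of $w$. Hence, $w$ is
of the form $\mathtt{b}w'\mathtt{e}$ with $w' \in \hat\Sigma^*$.

Now, assume that $w$ is a shortest non-empty word over $\hat\Sigma$ such that
$\hat\sigma(w) = \hat\tau(w)$. Let us check that $w' \in \Sigma^*$. By way of contradiction,
suppose that $\mathtt{e}$ occurs in $w'$: there exist $w_1$, $w_2 \in \hat\Sigma^*$ such that
$w' = w_1\mathtt{e}w_2$. Straightforward computations yield
$\hat\sigma(\mathtt{b}w_1\mathtt{e})\hat\sigma(w_2\mathtt{e}) = x = \hat\tau(\mathtt{b}w_1\mathtt{e})\hat\tau(w_2\mathtt{e})$.
Therefore, $\hat\sigma(\mathtt{b}w_1\mathtt{e})$ is a prefix of
$\hat\tau(\mathtt{b}w_1\mathtt{e})$ or $\hat\tau(\mathtt{b}w_1\mathtt{e})$ is a prefix of
$\hat\sigma(\mathtt{b}w_1\mathtt{e})$. From Lemma 11 we deduce that
$\hat\sigma(\mathtt{b}w_1\mathtt{e}) = \hat\tau(\mathtt{b}w_1\mathtt{e})$. Since
$\mathtt{b}w_1\mathtt{e}$ is shorter than $w$, a contradiction follows. Hence $\mathtt{e}$ does
not occur in $w'$. Similar arguments based on Lemma 12 show that $\mathtt{b}$ does not occur
in $w'$ either.

Hence, $w'$ is a word over $\Sigma$, and thus Lemma 10 ensures that
$s\sigma(w')t = s'\tau(w')t'$. It follows that $I$ is a yes-instance of GPCP($k$). □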

Strictly speaking, the correspondence problem that was originally introduced by Post 
in his 1946 paper [10] is, in our terminology, the restriction of PCP to erasement-free 
instances. 

Combining Theorems 1 and 2(ii), we obtain a slightly strengthened version of Claus's
theorem (Fact 4).

Corollary 2. Let $k$ be a positive integer. If PCP($k+4$) is decidable on erasement-free
instances then Accessibility($k$) is decidable.

Combining Corollary 2 and [9, Theorem 4.1] we obtain:

Corollary 3. PCP(7) is undecidable on erasement-free instances.